Indirect Rate-Distortion Function of a Binary i.i.d. Source

Alon Kipnis*, Stefano Rini^^ and Andrea J. Goldsmith* 

Abstract 

The indirect source-coding problem in which a Bernoulli process is compressed in a lossy manner from its noisy observations 
is considered. These noisy observations are obtained by passing the source sequence through a binary symmetric channel so that 
the channel crossover probability controls the amount of information available about the source realization at the encoder. We use 
classic results in rate-distortion theory to compute an expression of the rate-distortion function for this model, where the Bernoulli 
source is not necessarily symmetric. The indirect rate-distortion function is given in terms of a solution to a simple equation. In 
addition, we derive an upper bound on the indirect rate-distortion function which is given in closed form. These expressions capture
precisely the expected behavior that the noisier the observations, the smaller the return from increasing bit-rate to reduce distortion. 

Index Terms 

Indirect rate distortion problem; Binary source; Binary symmetric channel.

I. INTRODUCTION 

The optimal trade-off between bit-rate and average distortion in the representation of an information source is given by the 
Rate-Distortion Function (RDF); the RDF provides the minimum rate necessary to describe a source when its reconstruction 
is allowed to be to within a given average distortion from the original sequence. A natural extension of this source coding 
problem is the scenario in which the encoder cannot observe the source directly but obtains only noisy observations. This 
could be due to a number of phenomena such as environmental noise, finite precision quantization and sub-sampling [1]. In 
this setup, the encoder is required to describe the source from another process statistically correlated with the source itself: 
this problem is known as indirect or remote source coding [2, Sec. 3.5]. 

An interesting motivation for the indirect source coding problem arises in centralized sensing networks in which each sensor 
in the network is required to transmit its observation to a remote processing unit. Restrictions on the computational complexity 
and power consumption of the sensors make local processing infeasible and thus the uncompressed data has to be communicated 
over the network. The communication toward the central unit introduces noise in the sensors’ observations and the compression 
rate of the data acquired at the central node is determined by the indirect RDF. 

The general structure of an indirect source coding problem is depicted in Figure 1: the source process, $X^n$, is passed through
the noisy channel $P_{Y|X}$ to obtain the signal $Y^n$. The encoder compresses the sequence $Y^n$ at rate $R$ and the compressed
observation is provided noiselessly to the decoder. The receiver produces the sequence $\hat{X}^n$, which is a reconstruction of the
original signal $X^n$ to within a prescribed average distortion.

While in the direct source coding problem the RDF describes the optimal trade-off between the code rate R and distortion D, 
another quantity of merit in the indirect problem is the channel $P_{Y|X}$. By characterizing the trade-off in the indirect problem,
namely by an indirect RDF, it is possible to study the effect of the channel quality on the optimal rate-distortion trade-off. For 
instance, it is of interest to characterize the amount of additional code-rate needed to maintain a fixed distortion level as the 
observations become noisier. 

It has long been noticed [2], [3], [4] that an indirect source coding problem can be reduced to a standard source coding
problem by the following argument: it is possible to consider the observable process $Y^n$ as the source in the standard source
coding problem by amending the fidelity criterion to capture the distance between the reconstructed symbol $\hat{X}^n$ and all
possible realizations of the original source $X^n$, weighted according to the probability of their appearance given
$Y^n$. A particularly intuitive form of this observation appears in the case of a quadratic distortion, where the amended fidelity
criterion can be decomposed as the sum of two terms: (i) the mean squared error (MSE) estimation of the source from its
observation plus (ii) the error in describing the MSE estimate under a rate-limited description [4]. This separation allows one
to obtain the closed form expression of the indirect RDF in the Gaussian source, quadratic distortion and additive Gaussian
noise case [5], [1].

While, in general, similar separation results for other models do not exist, it may still be possible to solve the direct problem 
using the amended distortion measure. This approach is explored in this paper for the important case of a binary i.i.d source, 
bit flipping noise and the Hamming distortion. 

Related Work: The source coding problem was first introduced by Shannon in [6], while he provided the first proof of the source
coding theorem in [7]. The indirect rate-distortion problem was first introduced by Dobrushin and Tsybakov in [5]. The authors of
[5] derived a closed form solution for the indirect RDF in the Gaussian stationary case and, implicitly, showed an equivalence
of the indirect problem to a direct source coding problem with an amended fidelity criterion. Wolf and Ziv [4] showed that, in
the case of a quadratic distortion, the new fidelity criterion identified in [3] decomposes into the sum of two terms, only one
of which depends on the source coding rate $R$. Wolf and Ziv also computed the indirect RDF (iRDF) in various cases, including
the case of a Bernoulli source observed through a binary symmetric channel under quadratic distortion. Since the quadratic
distortion of a binary sequence is not larger than its Hamming distance, their result provides a lower bound on the iRDF of the
same source under the Hamming distance considered in this work. Berger [2] noted the equivalence of the indirect problem to a
modified direct problem with a new fidelity criterion, and gave an interpretation of the new fidelity criterion as the conditional
expectation of the original distortion measure given the source and noise realizations.

Fig. 1: Indirect source coding model

In the special case of a Bernoulli source observed through a binary channel, the computation of the iRDF is greatly simplified
when the source is symmetric, $P(X_n = 1) = 1/2$ [2, Exc. 3.8]. In our setting, where the observations are given by a binary
symmetric channel, this iRDF is given by

$$R_{X|Y}(D) = \begin{cases} \log 2 - h\left(\dfrac{D - p}{\bar{p} - p}\right), & p \le D \le 1/2, \\ 0, & D > 1/2, \end{cases} \qquad (1)$$

where $h(x)$ is the binary entropy function, $p < 1/2$ and $\bar{p} = 1 - p$ (the case $p > 1/2$ can be treated in a similar fashion).
The symmetric case can also be obtained as a special case from [8] where indirect versions of the multiterminal setting of 
Slepian-Wolf and Wyner-Ziv problems were considered. 

Contributions: We derive an expression for the iRDF of a Bernoulli process with $P(X_n = 1) = \alpha$ given its observation
$Y^n$ through a binary symmetric channel with crossover probability $p$ for the general case of $\alpha \in [0, 1/2)$. This iRDF $R_{X|Y}(D)$
is obtained by finding the unique root of an equation whose parameters are determined by $\alpha$, $p$ and $D$. Additionally, we show
that an upper bound on $R_{X|Y}(D)$ can be expressed as (for $p < 0.5$)

$$\bar{R}_{X|Y}(D) = \begin{cases} h(\alpha \star p) - h\left(\dfrac{D - p}{1 - 2p}\right), & p \le D \le 1/2, \\ 0, & D > \alpha, \end{cases} \qquad (2)$$

where $\alpha \star p = p\bar{\alpha} + \alpha\bar{p}$, with equality if and only if $\alpha = 1/2$, in which case $R_{X|Y}(D) = \bar{R}_{X|Y}(D)$ for all $D$.

The rest of this paper is organized as follows: the indirect source coding problem and the relevant background literature are
introduced in Sec. II. The main results are derived in Sec. III. Finally, Sec. IV concludes the paper. 


II. Problem Statement 

We consider the indirect source coding problem depicted in Fig. 1: an encoder observes the discrete-time process $X^n$ through
the noisy channel $P_{Y^n|X^n}$ and produces a sequence of coded symbols at rate $R$. From this sequence of coded symbols, the
decoder produces a reconstructed sequence $\hat{X}^n$, which must be within a maximum average distortion of $X^n$ for a prescribed
fidelity criterion.

More specifically, given a source sequence $X^n = \{X_k, k = 1, 2, \ldots, n\}$ with alphabet $\mathcal{X}^n$, the encoder is provided with the
sequence $Y^n$ with alphabet $\mathcal{Y}^n$ obtained from $X^n$ through the channel $P_{Y^n|X^n}(y^n|x^n)$ and maps this sequence onto the set
$\{1, \ldots, 2^{\lfloor nR \rfloor}\}$ through the mapping

$$W(Y^n): \mathcal{Y}^n \to \{1, \ldots, 2^{\lfloor nR \rfloor}\}. \qquad (3)$$

The value $W(Y^n)$ is noiselessly communicated to the receiver which, in turn, produces the sequence $\hat{X}^n$ with alphabet $\hat{\mathcal{X}}^n$
through the mapping

$$\hat{X}^n(W): \{1, \ldots, 2^{\lfloor nR \rfloor}\} \to \hat{\mathcal{X}}^n. \qquad (4)$$

The sequence $\hat{X}^n$ must be within a distortion $D$ of $X^n$ for some chosen fidelity criterion $d_n(x^n, \hat{x}^n)$, which is measured
with the per-letter distortion function $d(x_i, \hat{x}_i)$ as

$$d_n(x^n, \hat{x}^n) = \sum_{i=1}^{n} d(x_i, \hat{x}_i), \qquad (5)$$

for some real-valued, bounded function $d(\cdot, \cdot)$.

The operational indirect RDF Rx\y{D) is defined as the minimal rate R in (3) and (4) such that the average distortion 
between X" and X" in (5) does not exceed D, as the block-length n goes to infinity. 

The indirect (Shannon’s) RDF (iRDF) for the channel Ryn^x" is defined as 

Rx\y{D) = \imM Rn{D), 

n—^oc 

where 

R^{D) = inf (y^; X”) < R, 

and the infimum is taken over all mappings F" X" = (3) o (4) such that the average distortion between X" and X" is at 
most D. 


The customary source coding problem [7], also called the direct source coding problem, is obtained from the indirect source coding
problem by simply letting $Y^n = X^n$. It is noted in [3] that the problem of finding the operational indirect source coding rate
$R_{X|Y}(D)$ can be reduced to a direct source coding problem for the observable process $Y^n$ and a different distortion measure
$\hat{d}(\cdot, \cdot)$ defined as

$$\hat{d}_n(y^n, \hat{x}^n) \triangleq E\left[d_n(X^n, \hat{x}^n) \mid Y^n = y^n\right]. \qquad (6)$$

Note that $\hat{d}(\cdot, \cdot)$ depends only on $d(\cdot, \cdot)$ and $P_{Y^n|X^n}$, which are determined by the structure of the original indirect rate-distortion
problem.

Since 



it follows that i?x|y(^) equals the (direct) RDF Ry{D) of the process F^ under the fidelity criterion d{-,-). Shannon’s 
source coding theorem [7] now implies 


Rx\y{D) = Ry{D) = Rx\y{D). 


(7) 


The reduction of the indirect source coding problem to a direct problem under $\hat{d}(\cdot, \cdot)$ also provides us with an approach to
solve the indirect problem. Namely, one can compute the distortion $\hat{d}(\cdot, \cdot)$ and then compute the RDF for the source $Y^n$
under $\hat{d}(\cdot, \cdot)$.


A. Relevant results 

The computation of a direct RDF $R_U(D)$ of a source $U$ over a discrete alphabet $\mathcal{U}$ is performed by minimizing the mutual
information over the set of transition probabilities

$$P(\hat{u}|u) = P(\hat{U} = \hat{u} \mid U = u),$$

under the constraint

$$\sum_{u, \hat{u}} Q(u) P(\hat{u}|u) d(u, \hat{u}) \le D,$$

where $Q(u) = P(U = u)$ and $d(\cdot, \cdot)$ is the per-letter distortion measure. This is equivalent to finding a stationary point of the
Lagrangian

$$L_0(r, P) = \sum_{u, \hat{u}} Q(u) P(\hat{u}|u) \left[ \log \frac{P(\hat{u}|u)}{\sum_{u'} Q(u') P(\hat{u}|u')} + r\left(d(u, \hat{u}) - D\right) \right] \qquad (8)$$

over the set of all transition probabilities. By introducing the constraint on the transition probabilities and using the Lagrange
dual of (8), Gallager proved in [9] the theorem below.

Theorem II.1. [9, Thm. 9.4.1] For a given source entropy $H(U)$ and a given distortion measure $d(\cdot, \cdot)$, let

$$R_0(r, P) = \sum_{u, \hat{u}} Q(u) P(\hat{u}|u) \left[ \log \frac{P(\hat{u}|u)}{\sum_{u'} Q(u') P(\hat{u}|u')} + r\, d(u, \hat{u}) \right],$$

then for any $r > 0$,

$$\min_{P} R_0(r, P) = H(U) + \max_{f} \sum_{u} Q(u) \ln f_u, \qquad (9)$$

where the minimization in the LHS of (9) is over all transition probability functions $P = \{P(\hat{u}|u),\ u \in \mathcal{U},\ \hat{u} \in \hat{\mathcal{U}}\}$, and the
maximization in the RHS of (9) is over all $f = \{f_u,\ u \in \mathcal{U}\}$ with non-negative components satisfying the constraints

$$\sum_{u} f_u e^{-r d(u, \hat{u})} \le 1, \quad \hat{u} \in \hat{\mathcal{U}}. \qquad (10)$$

Necessary and sufficient conditions on $f$ to achieve the maximum in (9) are the existence of a set of non-negative numbers
$\{w(\hat{u}),\ \hat{u} \in \hat{\mathcal{U}}\}$ satisfying

$$1 = \frac{f_u}{Q(u)} \sum_{\hat{u}} w(\hat{u}) e^{-r d(u, \hat{u})}, \quad u \in \mathcal{U}, \qquad (11)$$

and that (10) is satisfied with equality for each $\hat{u}$ with $w(\hat{u}) > 0$.

Fig. 2: Equivalent descriptions of the channel $P_{Y|X}$.

It follows from (8) that if the conditions for equality in Theorem II. 1 hold, we have 

Ru{D) = mini?o(?’) P) = H{U) + max^ (3(u) In fu- 

U 

We refer to [10] for a discussion of Theorem II. 1 in the context of convex optimization theory as well as a geometric 
programming representation of this problem. 


B. Indirect RDF of a binary i.i.d. process

We now specialize our study of the iRDF to the case where $X^n$ is an i.i.d. binary process, $Y^n$ is obtained by passing $X^n$
through a memoryless Binary Symmetric Channel (BSC), and the distortion is the Hamming distortion measure.

More specifically, we focus on the case where $X_i \perp X_j$, $i \neq j$, and

$$Y^n = X^n \oplus Z^n,$$

where $X^n$ and $Z^n$ are two Bernoulli i.i.d. processes, independent of each other, with $P(X_i = 1) = \alpha$ and $P(Z_i = 1) = p$, $\forall i \in
\{1, \ldots, n\}$, respectively. Accordingly, $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ and $Y_i$ is a binary i.i.d. process with

$$\beta \triangleq P(Y_i = 1) = p \star \alpha, \quad \forall i \in \{1, \ldots, n\}.$$

For the fidelity criterion at the receiver we consider the case $\hat{\mathcal{X}} = \{0, 1\}$ and

$$d(x_i, \hat{x}_i) = x_i \oplus \hat{x}_i, \qquad (12)$$

which corresponds to the usual Hamming distance between $x^n$ and $\hat{x}^n$.

Remark II.2. Given the symmetry in the source $X_i$ and the noisy observations $Y_i$, we can consider $\alpha, p \le 1/2$: the remaining
cases can be obtained by complementing the observations $Y^n$ and/or the reconstructions $\hat{X}^n$.

In view of Remark II.2 we will assume $\alpha, p \le 1/2$ in the remainder of the paper.

III. Results 

A. Preliminaries 

From the definition of the iRDF we can infer some properties of $R_{X|Y}(D)$ for the model in Fig. 2:

Proposition III.1. The function $R_{X|Y}(D)$ must satisfy the following properties:

(i) $R_{X|Y}(D) = 0$ for any $D \ge \alpha$.

(ii) $R_{X|Y}(D)$ is only defined in the interval $D \ge \min\{p, \alpha\}$.

(iii) $R_{X|Y}(D)$ is non-decreasing in $p$.

(iv) $R_{X|Y}(D) \ge R_X(D)$ for any $D$, where

$$R_X(D) = \begin{cases} h(\alpha) - h(D), & 0 \le D \le \alpha, \\ 0, & D > \alpha, \end{cases} \qquad (13)$$

is the RDF of $X$ under the Hamming distortion (see e.g. [11]) and corresponds to the case $Y^n = X^n$.

TABLE I: Possible values of $\hat{d}(y_i, \hat{x}_i)$ in (14)


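Using the results in Section II, we can equate the indirect RDF $R_{X|Y}(D)$ to the (direct) RDF $R_Y(D)$ by defining the
amended distortion measure $\hat{d}(\cdot, \cdot)$ in (6), obtained as

$$\begin{aligned} \hat{d}_n(y^n, \hat{x}^n) &= \sum_{i=1}^{n} \sum_{x^n} (x_i \oplus \hat{x}_i)\, P(X^n = x^n \mid Y^n = y^n) \\ &= \sum_{i=1}^{n} \sum_{x_i} (x_i \oplus \hat{x}_i)\, P(X_i = x_i \mid Y_i = y_i) \\ &= \sum_{i=1}^{n} P(X_i \neq \hat{x}_i \mid Y_i = y_i) = \sum_{i=1}^{n} \hat{d}(y_i, \hat{x}_i). \end{aligned} \qquad (14)$$

It follows from (14) that the new distortion measure $\hat{d}(\cdot, \cdot)$ has an intuitive interpretation: if $\hat{x}_i \in \{0, 1\}$ is the estimate of
$X_i$ given the symbol $y_i \in \{0, 1\}$, then $\hat{d}(y_i, \hat{x}_i)$ is the probability of making an error in this estimation. Table I lists all the
possible values of $\hat{d}(y_i, \hat{x}_i)$.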
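As a numerical companion to (14), the following Python sketch tabulates the amended per-letter distortion $\hat{d}(y, \hat{x}) = P(X \neq \hat{x} \mid Y = y)$ for the BSC model of Section II-B. The function names `star` and `amended_distortion` are illustrative choices, not notation from the paper, and the code assumes $0 < \beta < 1$ so that both observation symbols have positive probability.

```python
def star(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1.0 - b) + b * (1.0 - a)


def amended_distortion(alpha, p):
    """Return d_hat[y][x_hat] = P(X != x_hat | Y = y).

    Model (assumed as in Section II-B): X ~ Bern(alpha), Z ~ Bern(p),
    Y = X xor Z, with X and Z independent.
    """
    beta = star(alpha, p)
    # joint[x][y] = P(X = x, Y = y)
    joint = [
        [(1.0 - alpha) * (1.0 - p), (1.0 - alpha) * p],
        [alpha * p, alpha * (1.0 - p)],
    ]
    p_y = [1.0 - beta, beta]
    # The estimate x_hat is wrong exactly when X = 1 - x_hat.
    return [[joint[1 - x_hat][y] / p_y[y] for x_hat in (0, 1)]
            for y in (0, 1)]


if __name__ == "__main__":
    for row in amended_distortion(0.25, 0.05):
        print(row)
```

Under these assumptions, reproducing the observation ($\hat{x} = y$) yields an average amended distortion of $p$, while always guessing 0 yields $\alpha$, in line with properties (i) and (ii) of Proposition III.1.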

B. Main Result 

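The next step is to use Theorem II.1 to derive the following result.

Theorem III.2. Let

$$g(r) \triangleq r(D - p) + \log\left(1 - e^{-r(u+v)}\right) - \bar{\beta} \log\left(1 - e^{-ru}\right) - \beta \log\left(1 - e^{-rv}\right). \qquad (15)$$

The iRDF $R_{X|Y}(D)$ is given by

$$R_{X|Y}(D) = \begin{cases} h(\beta) - g(r^*), & p \le D \le \alpha, \\ 0, & D > \min\{\alpha, p\}, \end{cases}$$

where $r^*$ is the unique solution to

$$\frac{\bar{\beta} u}{e^{r^* u} - 1} + \frac{\beta v}{e^{r^* v} - 1} - \frac{u + v}{e^{r^*(u+v)} - 1} = D - p, \qquad (16)$$

with $u = (\alpha - p)/\beta$ and $v = (\bar{\alpha} - p)/\bar{\beta}$.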


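Proof: Only an outline of the proof is provided here; the full proof is provided in App. A. In view of Proposition III.1 it
is enough to consider the case $p \le D \le \alpha \le 1/2$. Assume that equality holds in (10), that is,

$$f_0 e^{-r \hat{d}(0, \hat{x})} + f_1 e^{-r \hat{d}(1, \hat{x})} = 1, \quad \hat{x} \in \{0, 1\},$$

which implies

$$f_0 = \frac{1 - e^{-ru}}{e^{-r \alpha p / \bar{\beta}} \left(1 - e^{-r(u+v)}\right)}, \qquad f_1 = \frac{1 - e^{-rv}}{e^{-r \bar{\alpha} p / \beta} \left(1 - e^{-r(u+v)}\right)},$$

where $u = (\alpha - p)/\beta$ and $v = (\bar{\alpha} - p)/\bar{\beta}$. Note that both $u$ and $v$ are positive in the domain of interest. We next write

$$\begin{aligned} R_{X|Y}(D) &\ge h(\beta) + \bar{\beta} \log\left(1 - e^{-ru}\right) + \beta \log\left(1 - e^{-rv}\right) - \log\left(1 - e^{-r(u+v)}\right) - r(D - p) \\ &= h(\beta) - g(r). \end{aligned} \qquad (17)$$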


In order to maximize the RHS of (17), we take the derivative of $g(r)$ in (15), which gives

$$g'(r) = (D - p) + \frac{u + v}{e^{r(u+v)} - 1} - \frac{\bar{\beta} u}{e^{ru} - 1} - \frac{\beta v}{e^{rv} - 1}. \qquad (18)$$

It can be shown that $\lim_{r \to \infty} g'(r) = D - p > 0$, $\lim_{r \to 0^+} g'(r) = D - 1/2 < 0$ and that $g'(r)$ is non-decreasing for $r > 0$. All
this implies that the minimum of $g(r)$ is attained at a single point $r^*$ in the domain $r > 0$, which corresponds to $g'(r^*) = 0$, i.e., to (16).
We conclude that this $r^*$ maximizes the RHS of (17).

It is shown in Appendix A that for $p < \alpha < 1/2$ and $r = r^*$, there exist positive $w_0$ and $w_1$ that satisfy (11). This implies
that substituting $r^*$ in (17) leads to equality, i.e., the iRDF is given by the RHS of (17). ■
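The recipe of Theorem III.2 (solve (16) for $r^*$, then evaluate $h(\beta) - g(r^*)$) can be sketched numerically as follows. This is only an illustrative sketch: the name `irdf_binary`, the use of plain bisection, and the choice of natural logarithms (rates in nats) are assumptions made here, not part of the paper. The code further assumes $0 \le p < D \le \alpha \le 1/2$ with $D < 1/2$, so that (16) has a positive root.

```python
import math


def h(x):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def irdf_binary(alpha, p, D, iters=200):
    """Evaluate h(beta) - g(r*) of Theorem III.2 in nats.

    Assumes 0 <= p < D <= alpha <= 1/2 and D < 1/2.
    Returns the pair (rate, r_star).
    """
    beta = alpha * (1.0 - p) + p * (1.0 - alpha)
    u = (alpha - p) / beta
    v = (1.0 - alpha - p) / (1.0 - beta)

    def lhs16(r):
        # Left-hand side of (16); it tends to 1/2 - p as r -> 0+
        # and to 0 as r -> infinity.
        return ((1.0 - beta) * u / math.expm1(r * u)
                + beta * v / math.expm1(r * v)
                - (u + v) / math.expm1(r * (u + v)))

    lo, hi = 0.0, 1.0
    while lhs16(hi) > D - p:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):  # plain bisection (an illustrative choice)
        mid = 0.5 * (lo + hi)
        if lhs16(mid) > D - p:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)

    # g(r) from (15); -expm1(-x) equals 1 - exp(-x).
    g = (r * (D - p)
         + math.log(-math.expm1(-r * (u + v)))
         - (1.0 - beta) * math.log(-math.expm1(-r * u))
         - beta * math.log(-math.expm1(-r * v)))
    return h(beta) - g, r


if __name__ == "__main__":
    print(irdf_binary(0.25, 0.05, 0.1))
```

For very small $D - p$ the root $r^*$ becomes large and `math.expm1` may overflow; a production implementation would need to guard against that case.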

In the special case where $\alpha = 1/2$ and $p < \alpha$, we have that $\beta = 1/2$ and (16) reduces to

$$\frac{\bar{p} - p}{e^{r(\bar{p} - p)} - 1} - \frac{2(\bar{p} - p)}{e^{2r(\bar{p} - p)} - 1} = D - p, \qquad (19)$$

which leads to

$$r^* = \frac{1}{\bar{p} - p} \log \frac{\bar{p} - D}{D - p}.$$

Substituting $r^*$ in (15) results in $g(r^*) = h(\Delta)$, where

$$\Delta = \Delta(D, p) \triangleq \frac{D - p}{\bar{p} - p}.$$

It follows from Theorem III.2 that

$$R_{X|Y}(D) = \begin{cases} \log(2) - h(\Delta), & p \le D \le 1/2, \\ 0, & D > 1/2, \end{cases} \qquad (20)$$

which is equivalent to [2, Exc. 3.8]. Equation (20) has a similar form to the direct RDF (13) of a binary i.i.d. symmetric
process. It is interesting to compare (20) to (13) and to observe how the properties of $R_{X|Y}$ anticipated in Proposition III.1
are expressed in the special case of (20).

(i) $D = 1/2$ corresponds to $h(\Delta) = h(1/2) = \log(2)$.

(ii) The domain of $R_{X|Y}(D)$ is $0 \le \Delta$ or $p \le D$.

(iii) $\Delta$ is decreasing in $p$ and therefore $R_{X|Y}(D)$ is increasing in $p$.

(iv) (20) reduces to (13) for $p = 0$.
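The four observations above can be checked numerically with a short sketch of the closed form (20). The function name `irdf_symmetric` is a hypothetical label introduced here, and natural logarithms are assumed so that the rate is in nats.

```python
import math


def h(x):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def irdf_symmetric(p, D):
    """Equation (20): iRDF in nats for alpha = 1/2 and 0 <= p < 1/2."""
    if D < p:
        raise ValueError("R_{X|Y}(D) is not defined for D < p")
    if D >= 0.5:
        return 0.0
    delta = (D - p) / (1.0 - 2.0 * p)
    return math.log(2.0) - h(delta)


if __name__ == "__main__":
    # The same distortion level costs more rate as the channel gets noisier.
    for p in (0.0, 0.05, 0.1, 0.2):
        print(p, irdf_symmetric(p, 0.3))
```

With $p = 0$ the function returns $\log(2) - h(D)$, which is (13) for $\alpha = 1/2$, and for fixed $D$ the printed values grow with $p$.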

The slope of $R_{X|Y}(D)$ is an important parameter since it determines the maximal return in code-rate reduction for each
additional distortion unit the system can tolerate. In the range $p \le D \le 1/2$, this slope is given by

$$\frac{\partial R_{X|Y}(D)}{\partial D} = -\frac{1}{\bar{p} - p} \log \frac{\bar{p} - D}{D - p}. \qquad (21)$$

Note that this slope is steeper than the slope of $R_X(D)$, and its magnitude goes to infinity as $p$ approaches $1/2$ (see Fig. 3). This
fact confirms the intuition that an increment in the bit-rate when describing noisy measurements is less effective in reducing
distortion as the intensity of the noise increases.
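To make the comparison of slopes concrete, the sketch below evaluates the derivative of (20) with respect to $D$ next to the derivative of the direct RDF (13) for $\alpha = 1/2$. Both expressions are obtained here by differentiating $\log(2) - h(\cdot)$ directly; the function names are illustrative and rates are assumed to be in nats.

```python
import math


def slope_irdf_symmetric(p, D):
    """Derivative of (20) with respect to D, for p < D < 1/2 (nats)."""
    return -math.log((1.0 - p - D) / (D - p)) / (1.0 - 2.0 * p)


def slope_direct_symmetric(D):
    """Derivative of log(2) - h(D), i.e. (13) with alpha = 1/2 (nats)."""
    return -math.log((1.0 - D) / D)


if __name__ == "__main__":
    D = 0.3
    print("direct:", slope_direct_symmetric(D))
    for p in (0.05, 0.1, 0.2):
        print("p =", p, "indirect:", slope_irdf_symmetric(p, D))
```

For any $0 < p < D < 1/2$ the indirect slope is more negative than the direct one, and the two coincide at $p = 0$.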

Another interesting factor is the rate at which $R_{X|Y}(D)$ changes with $p$ for a fixed $p \le D \le \alpha \le 1/2$. This rate represents
the amount of excess coding needed as a result of increasing uncertainty on the source in order to keep a fixed distortion.


Due to the similarity between (20) and (13), it may be tempting to guess that $R_{X|Y}(D)$ is given in a similar form to (20)
even in the case where $\alpha < 1/2$. While an exact solution of (16) is hard to obtain in general, it is possible to obtain the
following bound.

Theorem III.3. For any $p, \alpha \in [0, 1]$ and $D \ge p$,

$$R_{X|Y}(D) \le h(\beta) - h(\Delta), \qquad (22)$$

where $\Delta = (D - p)/(1 - 2p)$.

Fig. 3: $R_{X|Y}(D)$ for $\alpha = 1/2$ and various values of $0 < p < 1/2$ that correspond to the vertical dashed lines.

Fig. 4: $R_{X|Y}(D)$, $R_X(D)$ and the upper bound (22) for $\alpha = 1/4$ and $p = 0.05$.

Proof: The proof is provided in App. B. ■

The bound in Theorem III.3 is illustrated in Figure 4. The fact that (22) is not tight in general can be easily seen since
$\Delta < \beta$ at $D = \alpha$ for $\alpha \neq 1/2$, so that the bound is strictly positive there while $R_{X|Y}(\alpha) = 0$. In fact, due to the convexity of $R_{X|Y}(D)$, a better bound can be obtained by adding the point
$R_{X|Y}(\alpha) = 0$ to the bounding curve and taking the convex closure, as illustrated by the dashed line in Figure 4.

In view of Theorems II.1, III.2 and III.3, the results in this paper can be summarized by the following statement. For
$p \le D \le \alpha$ and any $r > 0$ we have

$$h(\beta) - g(r) \le R_{X|Y}(D) \le h(\beta) - h(\Delta), \qquad (23)$$

where the LHS holds with equality if and only if $r$ satisfies (16), and the RHS holds with equality if and only if $\alpha = 1/2$.
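The sandwich (23) implies in particular that $g(r) \ge h(\Delta)$ for every $r > 0$. The sketch below checks this on a grid of $r$ values, with $g$ taken from (15) and $u$, $v$ as in Theorem III.2. The helper names `g` and `check_sandwich`, the grid, and the small numerical tolerance are assumptions of this illustration; logarithms are natural.

```python
import math


def h(x):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def g(r, alpha, p, D):
    """g(r) from (15), in nats, for r > 0 and p < alpha <= 1/2."""
    beta = alpha * (1.0 - p) + p * (1.0 - alpha)
    u = (alpha - p) / beta
    v = (1.0 - alpha - p) / (1.0 - beta)
    return (r * (D - p)
            + math.log(-math.expm1(-r * (u + v)))
            - (1.0 - beta) * math.log(-math.expm1(-r * u))
            - beta * math.log(-math.expm1(-r * v)))


def check_sandwich(alpha, p, D, r_grid, tol=1e-12):
    """True if h(beta) - g(r) <= h(beta) - h(Delta) on the whole grid."""
    beta = alpha * (1.0 - p) + p * (1.0 - alpha)
    delta = (D - p) / (1.0 - 2.0 * p)
    upper = h(beta) - h(delta)
    return all(h(beta) - g(r, alpha, p, D) <= upper + tol for r in r_grid)


if __name__ == "__main__":
    grid = [0.05 * k for k in range(1, 400)]
    print(check_sandwich(0.25, 0.05, 0.1, grid))
```

For $\alpha = 1/2$ the two sides meet at the closed-form $r^*$ of the symmetric case, which gives a convenient sanity check of the implementation.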

IV. Conclusions 

This paper studies the indirect rate-distortion problem for a binary i.i.d. source under the Hamming distortion given its noisy 
observation through a binary symmetric channel. The indirect rate distortion problem is an extension of the rate distortion 
problem in which the encoder is provided with a noisy observation of the source sequence. We investigate the rate-distortion 
tradeoff for the simple scenario of a binary source, bit flipping noise and Hamming distortion. Although conceptually simple, 
this model provides a number of key intuitions on more general models and illustrates important tradeoffs for practical systems. 
For instance, by deriving the relationship between rate and distortion at each noise level, we make it possible to determine 
how the sampling error and the communication error probabilities can be balanced in a remote sensor to obtain a desired target 
end-to-end quality of measurement. 












Appendix A 


In this Appendix we complete the proof of Theorem III.2 by showing the existence of positive $w_0$ and $w_1$ that satisfy (11).
From the expressions for $f_0$ and $f_1$ we obtain:

$$w_0 = \frac{\bar{\beta}}{1 - e^{-ru}} - \frac{\beta}{e^{rv} - 1}, \qquad (24)$$

$$w_1 = \frac{\beta}{1 - e^{-rv}} - \frac{\bar{\beta}}{e^{ru} - 1}. \qquad (25)$$

We need to show that (24) and (25) are positive for any $p \le \alpha \le 1/2$ and $r = r^*$. The case where $\alpha = 1/2$ was treated
above and leads to $w_0 = w_1 = 1/2$. If $p = \alpha$, then it follows from Proposition III.1 that $R_{X|Y}(D)$ is defined only for $D \ge \alpha$
and equals zero. We will therefore assume $p < D < \alpha < 1/2$.

Since $u > 0$, $v > 0$ and $\bar{p} - p > 0$ in the domain of interest, it can be shown that $\lim_{r \to \infty} w_0(r) = \bar{\beta}$ and that the derivative
of $w_0(r)$ is negative for any $r > 0$. This implies that $w_0(r) > 0$ for all values of $r$ in the domain of interest and in particular
at $r = r^*$.

For $w_1$ we can show that $\lim_{r \to 0^+} w_1(r) = -\infty$, $\lim_{r \to \infty} w_1(r) = \beta$ and it is monotonically increasing for $r > 0$. By
continuity of $w_1(r)$, it follows that there exists $r_0 > 0$ with $w_1(r_0) = 0$ such that $w_1(r) < 0$ whenever $r < r_0$ and $w_1(r) > 0$
whenever $r > r_0$. Since we have seen in the proof of Theorem III.2 that $g'(r)$ has similar behavior with a unique root $r^*$, we
conclude that if $g'(r_0) < 0$, then $r^* > r_0$ and then $w_1(r^*) > 0$. It is therefore enough to show that $g'(r_0) < 0$. Indeed, at
$r = r_0$ we have


Substituting that in the expression for g'{r) we obtain 


Kp) = gpP = ro) = -D + p + 




r(u-\-v) _ ^ ^ru _ ^ 


gri/ _ I 


Define 


a(r) = -D + P + 


u + V Pu Pv 

,r{u-\-v) _ ^ ^ru _ ^ ^ru _ ^ ' 


^(p-p)-u> 0 , 

PP 

we have that a{r) > g'(r = rg) for all r > 0. In addition, limr_>.oo a(f) = —D + p < 0 and 

,, , , , / u + v up \ 

a (r) = (« + !;)- 5 - H- 5 - , 

V (e’'(“+-) - 1)" (e™-l)7 

which is positive for all r > 0. We conclude that b{r) < air) < 0 for all r > 0. This proves the claim. 


Appendix B 

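Proof of Theorem III.3

It is enough to assume that $p \le \alpha \le 1/2$. For $\alpha = 1/2$ we have

$$g(r) = r(D - p) + \log\left(1 - e^{-2r(\bar{p} - p)}\right) - \log\left(1 - e^{-r(\bar{p} - p)}\right).$$

We will show that for all $r > 0$ and any $p \le D \le 1/2$, the difference between the $g(r)$ that corresponds to a general $\alpha$ and the one that corresponds
to $\alpha = 1/2$ is always positive. This difference can be written as

$$\delta(r) \triangleq \bar{\beta} \log \left( \frac{1 - e^{-r(u+v)}}{\left(1 - e^{-ru}\right)\left(1 + e^{-r|\bar{p} - p|}\right)} \right) + \beta \log \left( \frac{1 - e^{-r(u+v)}}{\left(1 - e^{-rv}\right)\left(1 + e^{-r|\bar{p} - p|}\right)} \right). \qquad (27)$$

The result follows by noting that $\lim_{r \to \infty} \delta(r) = 0$ and the derivative of $\delta(r)$ is strictly negative for any $r > 0$.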